Computing Translation Units and Quantifying Parallelism in Parallel Dependency Treebanks

Author

  • Matthias Buch-Kromann
Abstract

The linguistic quality of a parallel treebank depends crucially on the parallelism between the source and target language annotations. We propose a linguistic notion of translation units and a quantitative measure of parallelism for parallel dependency treebanks, and demonstrate how both can be used to compute transfer rules, spot annotation errors, and compare different annotation schemes. The proposal is evaluated on the 100,000-word Copenhagen Danish-English Dependency Treebank.


Similar resources

Treebanks in Machine Translation

We present an approach using treebanks in machine translation. Our experiment in Czech-English machine translation is an attempt to develop a full machine translation system based on dependency trees (Dependency Based Machine Translation, DBMT). We use the following resources: Prague Dependency Treebank, a newly created Czech-English parallel corpus of Penn Treebank, English monolingual corpus,...


Divergences in English-Hindi Parallel Dependency Treebanks

We present our analysis of systematic divergences in parallel English-Hindi dependency treebanks based on the Computational Paninian Grammar (CPG) framework. Study of structural divergences in parallel treebanks not only helps in developing larger treebanks automatically, but can also be useful for many NLP applications such as data-driven machine translation (MT) systems. Given that the ...


A Dependency Treebank of Classical Chinese Poems

As interest grows in the use of linguistically annotated corpora in research and teaching of foreign languages and literature, treebanks of various historical texts have been developed. We introduce the first large-scale dependency treebank for Classical Chinese literature. Derived from the Stanford dependency types, it consists of over 32K characters drawn from a collection of poems written in...


Cross Language Dependency Parsing using a Bilingual Lexicon

This paper proposes an approach to enhance dependency parsing in a language by using a translated treebank from another language. A simple statistical machine translation method, word-by-word decoding, where not a parallel corpus but a bilingual lexicon is necessary, is adopted for the treebank translation. Using an ensemble method, the key information extracted from word pairs with dependency ...


Integrating Data and Task Parallelism in Scientific Programs

Functional languages attract the attention of developers of parallelizing compilers because of the implicit parallelism of functional programs and the simplified data dependence analysis of functional statements. A major drawback of functional languages is that naive translation of functional programs results in code that requires excessive memory. In this paper we explore the connection betwee...




Publication date: 2007